CASH: Revisiting Hardware Sharing in Single-Chip Parallel Processors
Authors
Abstract
As increasing the issue width yields diminishing returns on superscalar processors, thread parallelism within a single chip is becoming a reality. In the past few years, both SMT (Simultaneous MultiThreading) and CMP (Chip MultiProcessor) approaches were first investigated by academia and are now implemented by industry. In some sense, CMP and SMT represent two extreme design points. In this paper, we explore intermediate design points for on-chip thread parallelism in terms of design complexity and hardware sharing. We introduce the CASH parallel processor (for CMP And SMT Hybrid). CASH retains resource sharing a la SMT when such sharing can be made non-critical for implementation, but adopts resource splitting a la CMP whenever sharing leads to a superlinear increase in implementation hardware complexity. For instance, sparsely used functional units (e.g. dividers), as well as branch predictors and instruction and data caches, can be shared among several "processor" cores. CASH does not exploit the complete dynamic resource sharing enabled by SMT, yet it outperforms a comparable CMP on a multiprogrammed workload as well as on a uniprocess workload. Our CASH architecture shows that intermediate design points exist between CMP and SMT.
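The share-versus-split rule described above can be illustrated with a small cost-model sketch. This is a hypothetical illustration, not a model from the paper: the thresholds, the `complexity_exponent` parameter, and the function name are all assumptions chosen to mirror the stated heuristic (share a resource when it is sparsely used and its shared implementation does not grow superlinearly; otherwise give each core a private copy).

```python
# Hypothetical cost-model sketch (not from the paper): decide whether a
# resource should be shared SMT-style or replicated per core CMP-style.
# The CASH rule of thumb: share when sharing is non-critical, split when
# implementation complexity grows superlinearly with the number of sharers.

def share_or_split(utilization, complexity_exponent, n_cores):
    """Return 'share' for sparsely used resources whose shared
    implementation cost grows at most linearly with core count;
    return 'split' otherwise. All thresholds are illustrative."""
    # Cost of one shared copy serving n cores (grows with the exponent).
    shared_cost = n_cores ** complexity_exponent
    # Cost of n private copies, one per core (linear by construction).
    split_cost = n_cores * 1.0
    if utilization < 0.5 and shared_cost <= split_cost:
        return "share"
    return "split"

# A divider is rarely used and cheap to arbitrate -> share among cores.
print(share_or_split(utilization=0.05, complexity_exponent=1.0, n_cores=4))
# Wakeup/select logic is heavily used and scales superlinearly -> split.
print(share_or_split(utilization=0.9, complexity_exponent=2.0, n_cores=4))
```

In this toy model the divider lands on "share" and the superlinearly scaling structure lands on "split", matching the paper's qualitative argument.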
Similar resources
A Case for Software Managed Coherence in Many-core Processors
Processor vendors are integrating more and more cores into their chips. These many-core processors usually implement hardware coherence mechanisms, but when the core count goes to hundreds or more, it becomes prohibitively difficult to design and verify efficient hardware coherence support. Despite this, many parallel applications, for example RMS applications [9], show little data sharing, whic...
Hardware Support for Synchronized Shared Data on Multicore Processors
Multicore processors allow manufacturers to integrate larger numbers of simpler processing cores onto the same chip with few or no changes to the processing core architecture. These processors can simultaneously execute threads from separate processes (multiprogrammed workloads) or from the same multi-threaded application (parallel workloads). The design space for on-chip memory hierarchies inc...
Algebraic Models of Simultaneous Multi-Threaded and Multi-Core Microprocessors
Superscalar microprocessors execute multiple instructions simultaneously by virtue of large amounts of (possibly duplicated) hardware. Much of this hardware is idle at least part of the time. Simultaneous multithreaded (SMT) microprocessors utilize this idle hardware by interleaving multiple independent execution threads. In essence, a single physical processor appears to be multiple virtual pr...
Synchronization and Pipelining on Multicore: Shaping Parallelism for a New Generation of Processors
The potential for higher performance from increasing on-chip transistor densities, on the one hand, and the limitations in instruction-level parallelism of sequential applications and in the scalability of increasingly complicated superscalar and multithreaded architectures, on the other, are leading the microprocessor industry to embrace chip multi-processors as a cost-effective solution for t...
Focus: Multiparadigm Programming
THE COMPUTER INDUSTRY IS EXPERIENCING a major shift: improved single-processor performance via higher clock rates has reached its technical limits due to overheating. Fortunately, Moore’s law still holds, so chip makers use transistors to boost performance through parallelism. Modern chips consist of multiple microprocessors (also called cores), buses, and cache memory on the same chip. As of t...
Journal: J. Instruction-Level Parallelism
Volume: 6, Issue: -
Pages: -
Published: 2004